ABSTRACT

Context. The Planck satellite was launched in 2009 by the European Space Agency to study the properties of the cosmic microwave 
background (CMB). An expected result of the Planck data analysis is the distinction of the various contaminants of the CMB signal. 
Among these contaminants is the Sunyaev-Zel'dovich (SZ) effect, which is caused by the inverse Compton scattering of CMB photons 
by high energy electrons in the intracluster medium of galaxy clusters. 

Aims. We modify a public version of the JADE (Joint Approximate Diagonalization of Eigenmatrices) algorithm, to deal with noisy 
data, and then use this algorithm as a tool to search for SZ clusters in two simulated datasets. 

Methods. The first dataset is composed of simple "homemade" simulations and the second of full sky simulations of high angular 
resolution, available at the LAMBDA (Legacy Archive for Microwave Background Data Analysis) website. The process of component 
separation can be summarized in four main steps: (1) pre-processing based on wavelet analysis, which performs an initial cleaning 
(denoising) of data to minimize the noise level; (2) the separation of the components (emissions) by JADE; (3) the calibration of the 
recovered SZ map; and (4) the identification of the positions and intensities of the clusters using the SExtractor software. 
Results. The results show that our JADE-based algorithm is effective in identifying the position and intensity of the SZ clusters, with
the purities being higher than 90% for the extracted "catalogues". This value changes slightly according to the characteristics of noise
and the number of components included in the input maps.

Conclusions. The main highlight of this work is the effective recovery rate of SZ sources from noisy data, with no a priori
assumptions. This powerful algorithm can be easily implemented and become an interesting complementary option to the "matched
filter" algorithm (hereafter MF) widely used in SZ data analysis.

Key words. Galaxy Clusters - Simulations - Independent Component Analysis - Blind Separation. 



1. Introduction 

During the passage of the cosmic microwave background (CMB)
radiation through clusters of galaxies, about 1% of the photons
are Compton scattered by energetic electrons in the intracluster
medium. This process causes a very distinctive signature
in the CMB spectrum, which was first described by
Sunyaev & Zeldovich (1969).

The Sunyaev-Zel'dovich (SZ) effect is a secondary CMB
anisotropy, meaning that it was produced after the decoupling
era. It has an angular size of the order of arcminutes and an
average intensity of a few hundred μK, which makes it difficult
to separate from the primary CMB signal and therefore difficult
to detect. However, some currently operating ground-based
experiments, such as the South Pole Telescope (SPT) and the Atacama
Cosmology Telescope (ACT), have sufficiently high sensitivities
to measure the SZ effect with high signal-to-noise ratio data and
enough angular resolution to obtain a very accurate SZ profile
from the observed clusters (Sehgal et al. 2011).

Together with current optical and X-ray surveys, SZ
measurements are expected to produce cluster images with
the highest possible sensitivities across significant fractions
of the sky (Carlstrom et al. 2011; Marriage et al. 2011;
Planck Collaboration 2011). The multiwavelength data will be
used to shed light on the cluster physics, to improve our
knowledge of scaling relations, and to produce catalogues to be used
in cosmological studies. Measurements of the SZ effect offer
a unique and powerful tool to test cosmological models and
put strong constraints on the parameters describing the universe
(e.g., Voit 2005; Allen et al. 2011). In addition to the SZ
effect, the hot intracluster gas is also characterized by its strong
bremsstrahlung emission in the X-ray band. Put together, both
effects can be used to estimate the distance of clusters and the
Hubble constant. In addition, the SZ effect can also be used to
estimate the $\Omega_b/\Omega_m$ ratio and the peculiar velocity of clusters. One
can also use large SZ surveys to constrain the dark energy
equation of state (see, e.g., Birkinshaw 1999; Carlstrom et al. 2000,
2002).

A full sky survey is being conducted by the Planck satellite,
launched in 2009 and the first mission of the European Space
Agency (ESA) dedicated to CMB studies. In January 2011, the
Planck team released the first version of its full-sky SZ cluster
catalogue (Planck Collaboration 2011). These results are already
being used to study the CMB contamination on angular scales
smaller than a few arcminutes ($\ell > 1000$), where the SZ effect,
together with radio and sub-mm point sources, dominates over
the primary CMB contribution (e.g., Taburet et al. 2010).

A number of algorithms have been used to extract the SZ
signal from CMB maps, but most use a priori assumptions
about the SZ signal contained in the input maps and identify
the "unknown" clusters based upon spectral identification
and information about shape, intensity, etc. (see, e.g.,

The main purpose of this work is to present a method to identify
SZ clusters in CMB maps, using a minimal set of a priori
conditions. To do this, we developed a "blind search" method,
based only on the spectral contributions of input signals, that has
performed very well in simulated sky maps which include many
of the public Planck satellite characteristics, such as the
asymmetric sky coverage, detector noise level, frequency coverage,
etc.

The outline of this paper is as follows: in Section 2 we
briefly describe the SZ effect theory and the pressure profile
described by M. Arnaud and collaborators. Section 3 contains the
details of the two datasets used in this work, one composed of
"homemade" simulations and another produced by Sehgal et al.
(2010). The methodology used to identify SZ clusters is
discussed in Section 4. Section 5 summarizes our results and our
concluding remarks are presented in Section 6.

2. The Sunyaev Zel'dovich effect 

The SZ effect produces a small distortion in the CMB spectrum,
with a temperature variation $\Delta T_{\rm SZ}$ given by

$$ \frac{\Delta T_{\rm SZ}}{T_{\rm CMB}} = f(x)\,y - \tau_e \left(\frac{v_{\rm pec}}{c}\right). \tag{1} $$



The first term in Eq. 1 corresponds to the distortion caused by
the thermal distribution of electrons located in the intracluster
medium that scatter the CMB photons. The Comptonization
parameter $y$ is given by $y = \int \sigma_T n_e \frac{k_B T_e}{m_e c^2}\, dl$, where $\sigma_T$ is the
Thomson cross-section, $n_e$ the electron density, $dl$ the line element
along the line of sight, and $f(x)$ the frequency dependence
given by

$$ f(x) = \left( x\,\frac{e^x + 1}{e^x - 1} - 4 \right) \left[ 1 + \delta_{\rm SZ}(x, T_e) \right], \tag{2} $$

where $x = h\nu / k_B T_{\rm CMB}$ and $\delta_{\rm SZ}(x, T_e)$ is the relativistic correction.
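As an illustration of Eq. 2, the frequency dependence can be evaluated at the five HFI frequencies used later in this work. The following is only a minimal sketch: it neglects the relativistic correction ($\delta_{\rm SZ} = 0$), and the value of $T_{\rm CMB}$ and the function name are assumptions introduced here, not quantities quoted in the text.

```python
import math

H_PLANCK = 6.62607015e-34   # Planck constant, J s
K_BOLTZ = 1.380649e-23      # Boltzmann constant, J / K
T_CMB = 2.725               # K; assumed value, not quoted in the text


def sz_spectral_factor(freq_ghz, t_cmb=T_CMB):
    """Non-relativistic f(x) of Eq. 2, i.e. with delta_SZ set to zero."""
    x = H_PLANCK * freq_ghz * 1e9 / (K_BOLTZ * t_cmb)
    return x * (math.exp(x) + 1.0) / (math.exp(x) - 1.0) - 4.0


if __name__ == "__main__":
    for nu in (100, 143, 217, 353, 545):
        print(f"{nu:4d} GHz  f(x) = {sz_spectral_factor(nu):+.3f}")
```

In this approximation the factor is negative below roughly 217 GHz, close to zero there, and positive above, which is the spectral signature that a purely spectral separation method can exploit.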

The second term in Eq. 1, the so-called kinetic SZ effect,
refers to the spectral distortion caused by the movement of the
cluster relative to the CMB radiation. It is caused by the cluster
speed, which creates a Doppler distortion of the scattered photons,
with $\tau_e$ being the optical depth, $v_{\rm pec}$ the speed of the cluster
along the line of sight, and $c$ the speed of light. This work
considers both the thermal and kinetic contributions in the
synthetic maps produced by Sehgal et al. (2010) and only the thermal
contribution in our own simulations, since the thermal effect
is usually at least one order of magnitude larger than the kinetic
one.

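Equation 1 can be rewritten to take into account the variation
in the Comptonization parameter $y$ as a function of the radial
coordinate of the projected cluster, following the discussion in
Komatsu et al. (2011):

$$ \frac{\Delta T_{\rm SZ}}{T_{\rm CMB}}(\theta) = f(x)\, \frac{\sigma_T}{m_e c^2} \int P_e\!\left(\sqrt{l^2 + \theta^2 D_A^2}\right) dl, \tag{3} $$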



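where $\theta$ is the angular distance from the cluster centre, $D_A$ the
angular diameter distance, $l$ the radial coordinate from the centre
of the cluster along the line of sight, $\sigma_T$ the Thomson
cross-section, $m_e$ the electron mass, $c$ the speed of light, and the
electron pressure profile is given by $P_e = n_e k_B T_e$. For a given
pressure profile $P_e(r)$, the SZ temperature variation $\Delta T_{\rm SZ}$ can be
written as

$$ \Delta T_{\rm SZ}(\theta) = f(x)\, T_{\rm CMB}\, \frac{\sigma_T}{m_e c^2}\, P_e^{\rm 2D}(\theta), \tag{4} $$

where $P_e^{\rm 2D}(\theta)$ is the electron pressure profile projected on the sky,
given by

$$ P_e^{\rm 2D}(\theta) = \int_{-\sqrt{r_{\rm out}^2 - \theta^2 D_A^2}}^{+\sqrt{r_{\rm out}^2 - \theta^2 D_A^2}} P_e\!\left(\sqrt{l^2 + \theta^2 D_A^2}\right) dl. \tag{5} $$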



Here, the pressure profile is truncated at $r_{\rm out}$. Arnaud et al. (2010)
define an electron pressure profile $P_e$, based on the
generalized Navarro-Frenk-White (NFW; Navarro et al. 1997)
model described by Nagai et al. (2007). This profile closely
describes the electron pressure profile obtained from X-ray data,
and is given by

$$ P_e(r) = 1.65 \times 10^{-3}\, h(z)^{8/3} \left[ \frac{M_{500}}{3 \times 10^{14}\, h_{70}^{-1}\, M_\odot} \right]^{2/3 + \alpha_p} p(x)\, h_{70}^2 \ {\rm keV\,cm^{-3}}, \tag{6} $$



where $h(z) = [\Omega_m (1+z)^3 + \Omega_\Lambda]^{1/2}$ is the
ratio of the Hubble constant at redshift $z$ to its present value,
$H_0$. Moreover, $\alpha_p = 0.12$ and $x = r/R_{500}$, where $R_{500}$ is
the radius within which the mean overdensity is 500 times the
critical density of the universe at redshift $z$,
$\rho_c(z) = 2.775 \times 10^{11} E^2(z)\, h^2\, M_\odot\, {\rm Mpc}^{-3}$ (with $E(z) \equiv h(z)$),
and $M_{500}$ is the mass within the radius $R_{500}$, given by

$$ M_{500} = \frac{4\pi}{3} \left[ 500\, \rho_c(z) \right] R_{500}^3. \tag{7} $$

The function $p(x)$ corresponds to the generalized NFW model

$$ p(x) = \frac{P_0}{(c_{500} x)^{\gamma} \left[ 1 + (c_{500} x)^{\alpha} \right]^{(\beta - \gamma)/\alpha}}, \tag{8} $$

and the best fit found by Arnaud et al. (2010) is given by

$$ [P_0, c_{500}, \gamma, \alpha, \beta] = [8.403\, h_{70}^{-3/2}, 1.177, 0.3081, 1.0510, 5.4905]. \tag{9} $$
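The pressure profile and its line-of-sight projection can be sketched numerically as follows. This is an illustrative sketch only, not the routine used for the simulations: it assumes $h_{70} = 1$, uses $\Omega_m = 0.27$ and $\Omega_\Lambda = 0.73$ as defaults, expresses lengths in units of $R_{500}$, and all function names are hypothetical.

```python
import math

# Best-fit generalized NFW parameters, quoted here for h_70 = 1 (assumption).
P0, C500, GAMMA, ALPHA, BETA = 8.403, 1.177, 0.3081, 1.0510, 5.4905
ALPHA_P = 0.12


def gnfw_shape(x):
    """Dimensionless generalized NFW profile p(x), with x = r / R_500."""
    cx = C500 * x
    return P0 / (cx ** GAMMA * (1.0 + cx ** ALPHA) ** ((BETA - GAMMA) / ALPHA))


def hubble_ratio(z, omega_m=0.27, omega_l=0.73):
    """h(z) = H(z) / H_0 for a flat LambdaCDM model."""
    return math.sqrt(omega_m * (1.0 + z) ** 3 + omega_l)


def electron_pressure(x, m500_msun, z):
    """Electron pressure P_e in keV cm^-3 (h_70 = 1 assumed)."""
    mass_term = (m500_msun / 3.0e14) ** (2.0 / 3.0 + ALPHA_P)
    return 1.65e-3 * hubble_ratio(z) ** (8.0 / 3.0) * mass_term * gnfw_shape(x)


def projected_shape(b, x_out=1.0, n_steps=2000):
    """Midpoint-rule line-of-sight integral of p(x) at projected radius b.

    Lengths are in units of R_500 and the profile is truncated at x_out,
    so the result is the projected profile up to constant factors.
    """
    if b >= x_out:
        return 0.0
    l_max = math.sqrt(x_out ** 2 - b ** 2)
    dl = l_max / n_steps
    total = sum(gnfw_shape(math.hypot((i + 0.5) * dl, b)) for i in range(n_steps))
    return 2.0 * total * dl  # the integrand is symmetric about l = 0
```

The midpoint rule is used so that the integrand is never evaluated at $x = 0$, where $p(x)$ diverges (integrably) for $\gamma > 0$.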

3. Description of simulated data 






To perform a thorough testing of our method, we used two different
sets of simulations. The first group is a "homemade" dataset
composed of five components (CMB, SZ effect, synchrotron,
free-free, and dust emission) at the frequencies of 100, 143, 217,
353, and 545 GHz (five Planck HFI frequencies). The second
group is a more realistic set of sky maps, including, in addition
to the aforementioned components, point sources. These simulated
data were developed to test the data reduction pipeline
for the Atacama Cosmology Telescope (ACT). These maps are
available at the LAMBDA (Legacy Archive for Microwave
Background Data Analysis) website (footnote 1). The details of our
simulations are presented below, along with a summary of the second
set of high-resolution sky simulations.

1 http://lambda.gsfc.nasa.gov/toolbox/tb_cmbsim_ov.cfm

3.1. "Homemade" simulations

Our goal was to generate simple synthetic maps to reproduce, in 
the simplest way for an outsider to the Planck Collaboration, the 
observations made by the Planck satellite. 

All maps were simulated using the HEALPix (Hierarchical
Equal Area iso-Latitude Pixelization) pixelization grid
(Górski et al. 2005). The maps produced by Planck will
have $N_{\rm side} = 2048$, which means that each map will consist of
$N_{\rm pix} \approx 5 \times 10^7$ pixels of size 1.7 arcmin.

However, as the angular resolutions of the Planck instruments
at the frequencies we simulated are between 10' and 4', it was
unnecessary to simulate these maps with pixels of 1.7' in diameter,
since this would be about three times finer than Planck's
spatial resolution.

We therefore created maps with $N_{\rm side} = 1024$ ($N_{\rm pix} \approx 1.2 \times 10^7$)
and an average pixel size of 3.43', which is smaller than the
quoted beam widths for the Planck frequencies.
Higher resolutions would enhance the SZ features of the profiles,
but since we search for previously unidentified clusters, instead
of studying the characteristics of the cluster profile, this does not
add significantly to the search process. Moreover, it increases the
processing time by a factor of $\sim \sqrt{N_{\rm pix}}$, with $N_{\rm pix}$ being the number
of pixels in the map. This set of maps was constructed at the
frequencies of 100, 143, 217, 353, and 545 GHz. A description
of the components used in the simulated maps is presented in
the following subsections.
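The pixel counts and pixel sizes quoted above follow directly from the HEALPix relation $N_{\rm pix} = 12 N_{\rm side}^2$. A short sketch, in which the function names are hypothetical and the pixel size is defined as the side of a square of equal area:

```python
import math


def healpix_npix(nside):
    """Total number of pixels of a HEALPix map: N_pix = 12 * N_side^2."""
    return 12 * nside * nside


def healpix_pixel_size_arcmin(nside):
    """Side of a square with the same area as one pixel, in arcminutes."""
    area_sr = 4.0 * math.pi / healpix_npix(nside)
    return math.degrees(math.sqrt(area_sr)) * 60.0


if __name__ == "__main__":
    for nside in (1024, 2048):
        print(nside, healpix_npix(nside), round(healpix_pixel_size_arcmin(nside), 2))
```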

3.1.1. Cosmic microwave background anisotropies 

We performed our simulations of the temperature fluctuations in
the CMB based on the $C_\ell$ coefficients created using the online
interface of the CMBFAST code (Seljak & Zaldarriaga 1996). We
considered the $\Lambda$CDM standard model with $\Omega_m \approx 0.27$, $\Omega_\Lambda \approx
0.73$, $\Omega_b h^2 \approx 0.024$, and $h = 0.72$. From this spectrum, the field
of CMB primary anisotropies of the whole sky was generated
using the SYNFAST routine of the HEALPix package. Figure 1
shows the synthetic CMB map, in thermodynamic temperature.

Fig. 1. Cosmic microwave background anisotropy map in
Mollweide projection, Galactic coordinates, and temperature in
Kelvin (colour scale from -0.00047 K to 0.00054 K).

3.1.2. Galactic emission

The Galactic contribution to the synthetic maps was added using
the WMAP 7-year (hereafter WMAP-7) "derived foreground
products" maps (Jarosik et al. 2011; Gold et al. 2011) available
at the LAMBDA website. However, the WMAP measurement
frequencies are different from those used in this work and we
had to scale the emission maps to the Planck frequencies,
assuming that they follow a power law with indices estimated by
the WMAP team. The intensity $I_e$ of each Galactic emission $e$,
with spectral index $\beta_e$, depends on the frequency $\nu$ according to
(Bennett et al. 2003)

$$ I_e(\nu) \propto \nu^{\beta_e}. \tag{10} $$

Since $I_e(\nu_1)$ and $I_e(\nu_2)$ are the intensities of a given emission $e$ at
two different frequencies ($\nu_1$ and $\nu_2$), the ratio of these
intensities can be written as

$$ \frac{I_e(\nu_1)}{I_e(\nu_2)} = \left(\frac{\nu_1}{\nu_2}\right)^{\beta_e} \;\Rightarrow\; I_e(\nu_1) = I_e(\nu_2) \left(\frac{\nu_1}{\nu_2}\right)^{\beta_e}. \tag{11} $$

Thus, we used a map $I_{\nu_2}$ of a foreground component at a given
frequency $\nu_2$ and the corresponding spectral index to obtain a
synthetically scaled map $I_{\nu_1}$ of emission $e$.

Equation 11 was applied, pixel by pixel, to the maps of synchrotron,
dust, and free-free emission in the W band (94 GHz). We
did not fit a spectral index for each pixel but instead used fixed
spectral indices, $\beta_s = -3.0$, $\beta_d = 2.0$, and $\beta_{ff} = -2.16$
(Gold et al. 2011), for the three types of emission, respectively.
Both the maps of Galactic emission and the spectral index values
used are part of the WMAP-7 products and results.

3.1.3. The SZ effect

The clusters were produced from the SZ temperature profiles
corresponding to the generalized Navarro-Frenk-White model
for the pressure profile of the intracluster gas, as described
in Section 2, using the value $r_{\rm out} = R_{500}$ for the integration.
We simulated 1000 synthetic clusters positioned throughout
the sky and outside the Galactic region, with random
orientations and following a uniform distribution. The temperature
profiles were constructed using an adaptation of the routine
available at Eiichiro Komatsu's website (see footnote 2), considering mass values
5 X IO'^'Mq < M500 < 1 X IO'^Mq and a redshift interval
3 X 10""* < z < 1.5. The resulting simulated maps, in the five
selected frequencies, are shown in Fig. 2. A section around the
north Galactic pole was selected to provide a clearer view of the
SZ signature at all of the five frequencies.

3.1.4. Symmetric and asymmetric noise 

The noise was simulated using the white noise sensitivities of
each chosen Planck channel, given in thermodynamic
CMB temperature units, estimated for the Planck mission (Tab. 1).
The simulation was carried out to obtain maps of white
Gaussian noise, assuming either a roughly homogeneous coverage
of the sky or an asymmetric sky coverage mimicking the
Planck observing scheme.

In the first case (of homogeneous white Gaussian noise, here- 
after HWGN), we generated for each frequency, a Gaussian ran- 
dom distribution of zero mean and standard deviation given by 
the corresponding white noise sensitivity for 15 months of the 
mission. 

2 http://gyudon.as.utexas.edu/~komatsu/CRL/index.html

Fig. 2. Gnomonic projection, centred on the north Galactic pole and in Kelvin, of the SZ effect at 100, 143, 217, 353, and
545 GHz.

In the second case, the Planck-like noise due to the asymmetric
sky coverage (hereafter NASC) was estimated using the same
white noise sensitivities and a scaled version of the observation
number ($N_{\rm obs}$) map of WMAP-7, which is also available at the
LAMBDA website. Since the most frequently observed regions
by both satellites are the ecliptic poles, we constructed an $N_{\rm obs}$
map that we considered an acceptable approximation for Planck
coverage. The WMAP-7 $N_{\rm obs}$ map was adapted to match Planck
values and the "ring" effect around the ecliptic poles, which is
not present in the Planck $N_{\rm obs}$ maps, was smoothed out.

Using the above-mentioned $N_{\rm obs}$ map for 15 months and a
Gaussian random distribution with zero mean and standard
deviation given by the Planck white noise sensitivity (in $\mu$K s$^{1/2}$),
we created Planck-like noise maps (NASC) for each frequency.
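The two noise models can be sketched as follows. The conversion from a sensitivity in $\mu$K s$^{1/2}$ to a per-pixel standard deviation is not spelled out in the text; this sketch assumes the usual relation $\sigma_{\rm pix} = s / \sqrt{t_{\rm pix}}$, with the total observing time shared among pixels in proportion to $N_{\rm obs}$. The function names, the month length, and the tiny map size are illustrative assumptions.

```python
import math
import random

SECONDS_PER_MONTH = 30.44 * 24 * 3600  # mean month length, an assumption


def pixel_sigma(sensitivity_uk_sqrt_s, n_obs, total_time_s):
    """Per-pixel noise r.m.s. (in uK) for a given hit-count map.

    The total observing time is shared among pixels in proportion to
    n_obs (all entries must be positive), and
    sigma = sensitivity / sqrt(time spent on the pixel).
    """
    total_hits = float(sum(n_obs))
    return [sensitivity_uk_sqrt_s / math.sqrt(total_time_s * hits / total_hits)
            for hits in n_obs]


def noise_map(sigmas, seed=0):
    """One zero-mean Gaussian noise realization with the given r.m.s. values."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, s) for s in sigmas]


if __name__ == "__main__":
    mission = 15 * SECONDS_PER_MONTH
    npix = 12 * 8 ** 2  # tiny N_side = 8 map, for illustration only
    uniform = pixel_sigma(14.5, [1] * npix, mission)             # HWGN-like
    uneven = pixel_sigma(14.5, [1] * (npix - 1) + [4], mission)  # NASC-like
    print(uniform[0], uneven[-1], len(noise_map(uneven)))
```

With a flat hit-count map every pixel receives the same r.m.s. (the HWGN case), whereas a pixel observed four times more often than another has half its noise level (the NASC case).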



3.1.5. Construction of a simulated Planck sky

Using the components described above, we produced a "homemade"
Planck sky. The maps were produced at 100, 143, 217,
353, and 545 GHz from a combination of maps of CMB, SZ
effect, synchrotron, dust, and free-free emissions, together with
noise, added with equal weights, as described in Equation 12,

$$ X^{\nu} = \sum_{i=1}^{N_c} x_i^{\nu}, \tag{12} $$

where $x_i^{\nu}$ is the map of the component (emission) $i$ at a given
frequency $\nu$ and $X^{\nu}$ is the resulting map of the linear combination
of $N_c$ components. Each frequency map was convolved with
the corresponding beam, using the full width at half maximum
(FWHM) values for the Planck channels (Tab. 1), then a realization
of the noise was added to each map.

Table 1. Characteristics of the Planck satellite instruments
(adapted from Planck Collaboration (2011)).

Frequency (GHz)                  100     143     217     353     545
FWHM (arcmin)                    9.37    7.04    4.68    4.43    3.80
Sensitivity (a) (μK_CMB s^1/2)   22.6    14.5    20.6    77.3    1011.3 (b)

(a) Uncorrelated noise in 1 s for the corresponding array of detectors
in each frequency.
(b) Obtained from the extrapolation of lower frequencies.

The SZ clusters were randomly placed all over the sky
and we used the WMAP-7 KQ75 mask (also available at the
LAMBDA website) to remove the Galactic plane neighbourhood,
as is usual in CMB analyses. The resulting maps for the
five Planck HFI frequencies are shown in Fig. 3.
Fig. 3. Linear combination of CMB, SZ effect, Galactic emission (synchrotron, dust, and free-free) and HWGN maps at 100, 143,
217, 353, and 545 GHz. The unit of the maps is K, in Galactic coordinates and Mollweide projection.



3.2. High-resolution full-sky simulations 



This second set of simulated maps was downloaded from the
LAMBDA website. They have an $N_{\rm side} = 8192$ pixelization,
corresponding to a resolution of 0.4 arcminutes, at six different
frequencies: 148, 219, and 277 GHz (the ACT observing frequencies)
and the additional 30, 90, and 350 GHz, close to the Planck
frequencies of the LFI and HFI. The maps are made of (1) the
CMB affected by the lensing of an intervening structure between
the last scattering surface and observers today; (2) the thermal
and kinetic SZ effects, plus higher-order relativistic corrections,
from galaxy clusters, groups, and the intergalactic medium; (3)
a population of dusty star-forming galaxies that emit strongly at
infrared (IR) wavelengths but still have significant microwave
emission; (4) a population of galaxies, including active galactic
nuclei, that emit strongly at radio wavelengths but still have
significant microwave emission; and (5) the foreground emission
of our own galaxy (dust, synchrotron, and free-free). A detailed
explanation of these simulations can be found in Sehgal et al.
(2010).



The catalogues of SZ halos and of both IR and radio galaxies
included in these simulations are also available at the LAMBDA
website. The SZ catalogue contains 1414339 objects in the first
octant, which are mirrored across the complete celestial sphere.
The mass range is 2 x IO^Mq < M500 < 1.5 x IO'^Mq, with 
redshifts in the range < z < 3. 

The simulated sky maps available at LAMBDA (hereafter
LAMBDA maps) have a very fine resolution, and to avoid a very
large computation time in analysing them, we re-pixelized them
from $N_{\rm side} = 8192$ to $N_{\rm side} = 2048$. We convolved the
lower-resolution maps with Gaussian beams with a FWHM extrapolated
from the Planck values (see Tab. 2), and then added noise.
Following the same procedure used in our "homemade" simulations,
two kinds of noise maps were used. One contained plain
white Gaussian noise with a uniform coverage per pixel. The
second considered an asymmetric sky coverage, which was identical
to the one described in Section 3.1.4 (HWGN and NASC),
but for which we used the white noise sensitivities given by the
extrapolation of the Planck values in Tab. 1. These values are shown
in Tab. 2.
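The re-pixelization step can be illustrated with a simple averaging scheme. The text does not state which routine was used; HEALPix distributions provide a ud_grade facility for this purpose. The sketch below is a hypothetical stand-in that assumes the map is stored in NESTED ordering, where all the fine pixels belonging to one coarse pixel are contiguous in memory.

```python
def degrade_nested(values, nside_in, nside_out):
    """Average a NESTED-ordered HEALPix map down to a coarser N_side.

    In the NESTED scheme the children of coarse pixel p occupy the
    contiguous index range [p * k, (p + 1) * k), where
    k = (nside_in / nside_out)^2.
    """
    if nside_in % nside_out != 0:
        raise ValueError("nside_in must be a multiple of nside_out")
    if len(values) != 12 * nside_in ** 2:
        raise ValueError("map length does not match nside_in")
    k = (nside_in // nside_out) ** 2
    return [sum(values[p * k:(p + 1) * k]) / k
            for p in range(12 * nside_out ** 2)]
```

Going from $N_{\rm side} = 8192$ to 2048 in this way averages 16 fine pixels into each coarse pixel. A map stored in RING ordering would first have to be reordered, which this sketch does not do.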
Table 2. Sensitivities extrapolated from Planck frequencies.

Frequency (GHz)              30       90      148     219     277     350
FWHM (arcmin)                32.65    9.42    6.73    4.66    4.43    4.44
Sensitivity (μK_CMB s^1/2)   146.8    25.7    14.2    20.9    32.5    74.3



4. Separation of components 

A CMB data set contains a combination of signals from many 
sources. The most significant come from our galaxy, the CMB 
itself, the SZ effect, and radio/IR point sources. Electronic noise 
is also produced by the detector and associated electronics. This 
section describes the method used to distinguish between the sig- 
nals from these various components. 

Our method is based on a numerical algorithm called the
Joint Approximate Diagonalization of Eigenmatrices (JADE)
(Cardoso & Souloumiac 1993; Cardoso 1999), based on
independent component analysis (ICA; see, e.g., Hyvärinen & Oja 2000)
and effective in extracting non-Gaussian components,
as in the case of the SZ effect. We highlight its most interesting
feature, that of not using any "prior" information about the input
components. This feature distinguishes JADE from other methods used
in the CMB/SZ analysis (Leach et al. 2008).

The original JADE code is inefficient in the presence of noise
and we introduced a wavelet pre-cleaning method prior to feeding
the data to JADE. After component separation, we used
the SExtractor package (footnote 3; Bertin & Arnouts 1996) to detect and
identify the positions and intensities of the clusters. We describe
below the steps of our pipeline, from the initial data preparation
to the elaboration of the final catalogue of cluster candidates.

The implementation of our pipeline was done fully in IDL
(Interactive Data Language) for a number of reasons. First, this
environment is very popular in the astronomy community; second,
it is one of the languages used by the HEALPix package;
and, third, it is the chosen language of the CMB community for
image processing. We modified the JADE routine available in the
MRS (Multi-Resolution on the Sphere) package (footnote 4), including a
wavelet-based pre-whitening processing step described in
the next section.

The processing time for the full cluster identification pipeline
(pre-whitening+JADE+SExtractor) was ~18 minutes for the
"homemade" simulations and ~1.2 h for the LAMBDA maps.
Figure 4 summarizes the data flow in our pipeline.

4.1. Noise filtering 

The presence of noise requires some sort of pre-processing to
permit JADE to deal with the data. This pre-processing starts by
wavelet-transforming each map. The transformation to wavelet
space retains the information contained in the pixels while
averaging the noise contribution and highlighting the data structures
(Pires et al. 2006).

We used the Daubechies wavelet transform to remove the
noise from the data. The reasons for choosing this wavelet family
are comprehensively discussed in, e.g., Torrence & Compo
(1998). After conducting several tests, varying the order and
level of the wavelet transform applied to the data, and comparing
the results obtained with JADE in each test, we conclude that
the best choices for this dataset are an order N = 3 (db3) and a
decomposition level n = 5. It is important to remember that the
higher the level used in the transformation, the more noise-free
the data.

However, our various runs show there is an optimal decom- 
position level, above which a kind of "saturation" occurs. When 
starting the denoising process, it is advisable to perform a few 
tests to verify the optimal level for a given dataset. 
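The idea of multi-level wavelet denoising can be illustrated with a deliberately simplified sketch. It differs from the procedure described here in several stated ways: it uses the Haar wavelet instead of db3, it works on a one-dimensional sequence instead of a sky map, and it suppresses noise by soft-thresholding the detail coefficients, which is a common choice but an assumption, since the text does not specify how the coefficients were treated. All function names and the threshold are hypothetical.

```python
import math


def haar_decompose(signal, levels):
    """Multi-level orthonormal Haar transform of a 1-D sequence.

    Returns (approximation, [details_level1, ..., details_levelN]).
    The length of signal must be divisible by 2**levels.
    """
    approx, details = list(signal), []
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / math.sqrt(2.0) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2.0) for a, b in pairs]
    return approx, details


def haar_reconstruct(approx, details):
    """Inverse of haar_decompose."""
    approx = list(approx)
    for detail in reversed(details):
        rebuilt = []
        for s, d in zip(approx, detail):
            rebuilt.extend(((s + d) / math.sqrt(2.0), (s - d) / math.sqrt(2.0)))
        approx = rebuilt
    return approx


def soft_threshold(coeffs, thr):
    """Shrink each coefficient towards zero by thr."""
    return [math.copysign(max(abs(c) - thr, 0.0), c) for c in coeffs]


def denoise(signal, levels, thr):
    """Decompose, soft-threshold every detail band, and reconstruct."""
    approx, details = haar_decompose(signal, levels)
    return haar_reconstruct(approx, [soft_threshold(d, thr) for d in details])
```

Each additional level thresholds a coarser detail band, which is why a higher level removes more noise but eventually starts to remove signal as well, consistent with the need to test for an optimal level on each dataset.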

After the transformation of the data to the wavelet space, we
filtered the maps with the HEALPix smoothing.f90 routine, using
Gaussian beams with FWHM=8' for our "homemade" simulations
and FWHM=3' for the LAMBDA set to minimize the
noise level prior to the application of JADE.

It is important to stress, at this point, that no previous infor- 
mation about the input data is used. This means that the cluster 
shape, mass thresholds, or redshift information, for instance, are 
not taken into account. Our wavelet tests are based solely upon 
the spectral information contained in the data. 

4.2. Sorting input signals: The JADE algorithm 

Many methods developed for signal separation are based on
ICA and belong to the class of blind source separation (BSS)
problems. A typical example of BSS is the processing
of multidimensional data with no a priori information
(Hyvärinen et al. 2001).

This problem consists primarily of retrieving a set of m
statistically independent signals from m observed instantaneous
mixtures of these signals (see, e.g., Cardoso & Souloumiac 1993;
Cardoso 1994). In other words, the goal is to estimate the matrix
of the sources (independent components), S, and the mixing matrix,
A, from X, the matrix of linear combinations of individual
sources. This mixture model is described by the equation



$$ X = AS, \tag{13} $$

where X is an $m \times T$ matrix, with T the number of observed samples
(each row is a mixture of m sources at a specific frequency), S is
an $m \times T$ matrix (each row is the signal from a particular source),
and A is an $m \times m$ invertible matrix, which specifies the
contributions of the original signals in S to X.

3 http://www.astromatic.net/software/sextractor
4 http://irfu.cea.fr/en/Phocea/Vie_des_labos/Ast/ast_visu.php?id_ast=895

It is important to warn the user of some shortcomings and
limitations of ICA (Hyvärinen et al. 2001). First, ICA assumes
that the independent components are statistically independent.
Second, at least one of the independent components must come
from a non-Gaussian distribution, because Gaussian distributions
have higher-order cumulants equal to zero, which means
that the ICA model cannot be applied. Finally, for the sake of
simplicity, the model assumes that the mixing matrix is square,
i.e., the number of independent components equals the number
of observed mixtures. However, this is not a mandatory condition
for using ICA; for details, we refer to Hyvärinen et al.
(2001).

In addition to these limitations, the ICA method does not
return the actual amplitudes of signals, since these are initially
unknown. However, this is not a major problem, since the signal
can be recalibrated after the separation of the components. This
issue is discussed in Section 4.3. In addition, the method does
not allow the user to determine the sequential ordering of the
independent components in the S matrix rows, so the ordering
can be freely changed.

Originally introduced by Cardoso & Souloumiac (1993),
JADE is a statistical, ICA-based technique that relies on
high-order statistics. Its mixture model is given by Equation 13 and
assumes that the resulting sources in S are non-Gaussian random
processes with a high signal-to-noise ratio. Since a real
noise-free map does not exist, there is a need for data pre-processing
before applying this method.

Fig. 4. Block diagram summarizing the SZ detection pipeline for simulated maps.

We now describe the data processing steps used by
JADE to obtain the independent components, i.e. the sources
(Hyvärinen et al. 2001). The method starts by centring the data,
so that both the mixture variables and the independent
components have zero means, and it then performs a whitening
of the observed signals. For the model described by Equation
13, the whitening of X is carried out by the whitening matrix Y
(the inverse of the square root of the covariance matrix of the
data), generating the white vector $Z = YX = YAS$. We then
compute a new orthogonal mixing matrix $W^T = YA$ and a new
separation matrix W (Hyvärinen et al. 2001; Pires et al. 2006).
After the whitening of the data, the ICA Equation 13 becomes
$Z = W^T S$.
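The centring and whitening steps can be sketched with NumPy as follows. This is an illustration on toy data, not the MRS/JADE implementation: the two sources, the mixing matrix, and the function name are assumptions, and the whitening matrix is built from the eigendecomposition of the sample covariance.

```python
import numpy as np


def whiten(x):
    """Centre the rows of x and return (z, y) such that cov(z) is the identity.

    y is the inverse square root of the sample covariance matrix of x.
    """
    xc = x - x.mean(axis=1, keepdims=True)
    cov = xc @ xc.T / xc.shape[1]
    eigval, eigvec = np.linalg.eigh(cov)
    y = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T
    return y @ xc, y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two unit-variance, non-Gaussian toy sources (the rows of S).
    s = np.vstack([rng.uniform(-np.sqrt(3), np.sqrt(3), 20000),
                   rng.laplace(0.0, 1 / np.sqrt(2), 20000)])
    a = np.array([[1.0, 0.6], [0.4, 1.0]])  # hypothetical mixing matrix A
    z, y = whiten(a @ s)
    print(np.round(np.cov(z, bias=True), 3))  # covariance of the whitened data
    print(np.round((y @ a) @ (y @ a).T, 2))   # Y A times its transpose
```

When the sources have unit variance and are uncorrelated, the product $YA$ is close to an orthogonal matrix, which is why only a rotation remains to be estimated after whitening.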

The cumulant tensor of the whitened matrix Z has a special
structure, which can be seen from its eigenvalue decomposition,
that accounts for the independent components. To achieve this,
the matrix M is assumed to have the form $M = w_m w_m^T$ (for
$m = 1, \dots, n$), which is an eigenmatrix of the cumulant tensor

$$ F_{ij}(M) = \lambda M_{ij} = \sum_{kl} w_{mk} w_{ml}\, {\rm cum}(z_i, z_j, z_k, z_l), \tag{14} $$

where $w_m$ is a row of the W matrix and $\lambda$ is the eigenvalue. When
the eigenvalues are distinct from each other, each eigenmatrix
corresponds to a single eigenvalue and has the form $w_m w_m^T$, giving one of
the rows of W. Thus, with knowledge of the eigenmatrices of the
cumulant tensor it is possible to obtain the independent components.
JADE was designed to solve the case of indistinguishable
eigenvalues.

According to Hyvärinen et al. (2001), the eigenvalue decomposition
can also be understood as a diagonalization process.
Hence, the eigenvalue decomposition is also a diagonalization
of the cumulant tensor F(M), performed by applying
the matrix W to $F(M_i)$ for any $M_i$, as

$$ Q = W F(M_i) W^T. \tag{15} $$
Thus, W is chosen such that Q is as diagonal as possible.

Since the W matrix is orthogonal, its multiplication by another
matrix does not change the total sum of squares of the elements
of this matrix, thus minimizing the sum of the squares of
the off-diagonal elements is equivalent to maximizing the sum of
the squares of the diagonal elements. Thus, this algorithm aims
to maximize the function

$$ J_{\rm JADE}(W) = \sum_i \left\| {\rm diag}\!\left( W F(M_i) W^T \right) \right\|^2. \tag{16} $$

The maximization of $J_{\rm JADE}$ is a method of joint approximate
diagonalization of the $F(M_i)$. The $M_i$ matrices are chosen from
the eigenmatrices of the cumulant tensor, which provide all relevant
information about the cumulants because they share the
same space as the cumulant tensor.

cator of accuracy of our calibration method for the JADE output
maps. We also estimate the average dispersion D of the data

D -Z y-' (17)

obtaining 0.27.

The A matrix is obtained by applying JADE to the data in
wavelet space. Multiplying its inverse ($A^{-1}$) by X, we obtain the
S matrix of components. This result can be achieved because the
application of the wavelet transform does not affect the A matrix,
but only increases the accuracy of the calculation. Once the A
matrix had been calculated in this way, its inverse was applied to
the input data to extract the SZ map.
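As a minimal sketch of this step, assuming the linear model X = AS
with a square, invertible A, the following Python example recovers S
from X for a hypothetical 2 x 2 mixing matrix and four "pixels". The
numbers and function names are illustrative only. The second part
shows a general property of blind separation: each column of A is
determined only up to a scale factor, so the recovered components
carry an arbitrary normalization.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def inverse_2x2(m):
    """Inverse of a 2x2 matrix; raises if the matrix is singular."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("mixing matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical toy data: 2 components (rows of S) sampled at 4 pixels.
sources = [[0.0, 1.0, 0.0, 2.0],
           [1.0, -1.0, 0.5, 0.0]]
mixing = [[1.0, 0.6],
          [0.4, 1.0]]                      # A: one row per channel

observed = matmul(mixing, sources)                   # X = A S
recovered = matmul(inverse_2x2(mixing), observed)    # S = A^-1 X

# Rescaling the columns of A rescales the recovered components by the
# inverse factors: the shapes are preserved, the amplitudes are not.
scales = [2.0, -0.5]
mixing_rescaled = [[a * s for a, s in zip(row, scales)] for row in mixing]
recovered_rescaled = matmul(inverse_2x2(mixing_rescaled), observed)
```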

Figure 5 shows an example of the extraction of an SZ map,
obtained from the analysis of the "homemade" input maps with 
HWGN. It can be seen from these results that the temperature 
scale of the recovered SZ map does not match the scale of the 
simulated maps, since JADE loses the calibration information 
during data processing. The next section discusses the process 
of calibration recovery for each frequency. 

4.3. Recovering calibration 

The appropriate method for calibrating the recovered map is de- 
rived from an initial analysis comparing the output map to the 
input map to see how the fluxes of known clusters or other potential
calibrators change in each map region. We found that the intensity
of the map recovered by JADE differs from the input data by a nearly
constant factor across the whole map. Since we do not deal with real
data, no known sources can be used to reconstruct the calibration,
so we used our simulated input clusters to accomplish this task. In
a real map, however, prior knowledge of
the fluxes of a few well-known sources allows the calibration for 
the full map to be made. 

We took the clusters' central values $\Delta T_{\rm SZ}$ in the input
and output maps and calculated the ratio of the two for each selected
cluster. The average value of these ratios, at each of the
frequencies, is the value by which the recovered map is multiplied to
recover the calibration. In this work, we used 50 randomly chosen
clusters to perform the procedure, since the larger the number of
clusters used in the calibration, the more accurate the result.
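The calibration procedure described above can be sketched in a few
lines of Python. This is an illustration rather than the code used in
this work: the function names are hypothetical, the peak values are
invented, and only four clusters are used instead of 50.

```python
from statistics import mean

def calibration_factor(input_peaks, recovered_peaks):
    """Mean of the input/recovered ratios of the clusters' central values."""
    ratios = [t_in / t_rec
              for t_in, t_rec in zip(input_peaks, recovered_peaks)]
    return mean(ratios)

def apply_calibration(recovered_map, factor):
    """Multiply every pixel of the recovered map by the factor."""
    return [factor * value for value in recovered_map]

# Hypothetical central Delta T_SZ values at one frequency (input map) and
# the corresponding uncalibrated values read off the recovered map.
input_peaks = [-120.0, -80.0, -200.0, -150.0]
recovered_peaks = [-0.61, -0.39, -1.02, -0.74]

factor = calibration_factor(input_peaks, recovered_peaks)
calibrated_peaks = apply_calibration(recovered_peaks, factor)
```

Repeating the calculation with the input peaks of each frequency map
gives one factor per frequency.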

A section of the map in Fig. 5, calibrated for each frequency, is
shown in Fig. 6, which can be inspected and visually compared
to the same section of the simulated map (Fig. 2) to check for
large differences in the temperature scales.

We proceeded to compare the y values of the profile amplitude
(the central value) calculated from both the simulated and
calibrated $\Delta T_{\rm SZ}$ values of the clusters, shown in
Fig. 7. In this graph, each point corresponds to a single cluster and
the diagonal line represents the "equality line". The closer a point
is to the diagonal, the closer the input (simulated) and output
(recovered and calibrated) y values. Thus, the plot in Fig. 7 is a
good indicator of the accuracy of our calibration method for the JADE
output maps. We also estimate the average dispersion D of the data,

D = … (17)

obtaining 0.27.



5. Cluster detection 

After recovering the clusters from a "full sky with noise"
using JADE, and recalibrating their fluxes, we used the
SExtractor package (Bertin & Arnouts 1996; Holwerda 2005)
to select the cluster candidates. The most important SExtractor
parameters, and those that most strongly influence the
results, are DETECT_THRESH (the detection threshold),
DETECT_MINAREA (which sets the minimum number
of pixels above the threshold triggering detection), and
FILTER_NAME (which selects the file containing the filter
definition).

SExtractor offers a number of filters to be used in this kind
of analysis, which have various FWHM and sizes (both given
in pixels). For our analysis, the filter that most closely
recovered the input data was the Gaussian filter. For the
"homemade" datasets, we used a FWHM of 4 pixels and a mask with
7 × 7 pixels. For the LAMBDA dataset, the values were,
respectively, 2 and 5 × 5 pixels. In addition, we used threshold
values of 2.5σ, 1.5σ, 2.0σ, and 2.0σ for the "homemade" datasets
+ HWGN, the "homemade" datasets + NASC, the LAMBDA
datasets + HWGN, and the LAMBDA datasets + NASC,
respectively. DETECT_MINAREA was set equal to 4 and 8 for
the "homemade" and LAMBDA datasets, respectively.
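For reference, the parameter choices quoted above can be collected in
a small Python table and rendered as SExtractor-style configuration
lines. This is only a sketch: the thresholds and minimum areas are
the values given in the text, whereas the FILTER_NAME file names and
the FILTER switch are assumptions about how the Gaussian filters
would be selected, and `render_config` is a hypothetical helper.

```python
# Thresholds (in units of sigma) and minimum areas are taken from the text.
# The FILTER_NAME values are assumed file names for Gaussian masks with the
# quoted FWHM and size; they are not stated in the text.
SETTINGS = {
    ("homemade", "HWGN"): {"DETECT_THRESH": 2.5, "DETECT_MINAREA": 4,
                           "FILTER_NAME": "gauss_4.0_7x7.conv"},
    ("homemade", "NASC"): {"DETECT_THRESH": 1.5, "DETECT_MINAREA": 4,
                           "FILTER_NAME": "gauss_4.0_7x7.conv"},
    ("LAMBDA", "HWGN"): {"DETECT_THRESH": 2.0, "DETECT_MINAREA": 8,
                         "FILTER_NAME": "gauss_2.0_5x5.conv"},
    ("LAMBDA", "NASC"): {"DETECT_THRESH": 2.0, "DETECT_MINAREA": 8,
                         "FILTER_NAME": "gauss_2.0_5x5.conv"},
}

def render_config(dataset, noise):
    """Return 'KEY VALUE' lines for one dataset and noise model."""
    params = {"FILTER": "Y", **SETTINGS[(dataset, noise)]}
    return "\n".join(f"{key:<16}{value}" for key, value in params.items())

homemade_hwgn = render_config("homemade", "HWGN")
```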

We compared the positions of the calibrated clusters found by
SExtractor with those included in the input sky maps, to account
for false detections. For each cluster candidate position indicated
by SExtractor, we checked for the existence of clusters within a
circle of radius three pixels around that position. If there was
no cluster in the region, the candidate was considered a false
detection. If there was more than one, the most massive cluster was
assumed to be the detection, since the most massive cluster is the
one most likely to be found. At this point, we did not consider the
possibility of multiple detections.
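The matching criterion can be sketched as follows. The sketch assumes
flat pixel coordinates and Euclidean distances, which is a
simplification for full-sky maps, and the function name, coordinates
and masses are hypothetical.

```python
import math

def match_candidates(candidates, clusters, radius=3.0):
    """Match each candidate (x, y) to an input cluster (x, y, mass).

    Returns one entry per candidate: the most massive input cluster
    within `radius` pixels, or None for a false detection.
    """
    matches = []
    for cx, cy in candidates:
        nearby = [c for c in clusters
                  if math.hypot(c[0] - cx, c[1] - cy) <= radius]
        matches.append(max(nearby, key=lambda c: c[2]) if nearby else None)
    return matches

# Hypothetical example: two candidates and three input clusters.
candidates = [(10.0, 10.0), (50.0, 50.0)]
clusters = [(11.0, 12.0, 3.0e14), (9.0, 10.0, 6.0e14), (80.0, 80.0, 1.0e15)]

matches = match_candidates(candidates, clusters)
n_confirmed = sum(m is not None for m in matches)
```

Here the first candidate has two input clusters within three pixels
and is assigned the more massive one, while the second has none and
counts as a false detection.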

Our results obtained from the analysis of both datasets and
noise types are summarized in Tab. 3, which shows the number
of cluster candidates indicated by SExtractor, the number of
confirmed clusters, and finally both the purity and completeness of
the recovered "catalogue". Figs. 8 and 9 present the
completeness by redshift and mass interval for each dataset. The first
one shows that the completeness does not change significantly 
with redshift, which highlights the redshift independence of the 
SZ effect, as expected. The second one shows the sensitivity of 
the SZ effect to the mass of the cluster, the completeness in- 
creasing with increasing mass. It can also be seen from these 
figures that the different noise models led to slightly different re- 
sults, implying that one has to test the pipeline parameters to find 
the most appropriate filtering scheme (with respect to instrumen- 
tal properties such as beam size and expected noise level) for a 
given dataset. 
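The purity and completeness values follow directly from the counts in
Tab. 3, as the short Python check below illustrates. The found and
confirmed counts are those of the table. The number of simulated
clusters in the "homemade" maps (1000) is not quoted in this section
and is inferred here from the tabulated completeness, so it should be
read as an assumption.

```python
def purity(confirmed, found):
    """Percentage of detections that correspond to an input cluster."""
    return 100.0 * confirmed / found

def completeness(confirmed, simulated):
    """Percentage of simulated clusters that were recovered."""
    return 100.0 * confirmed / simulated

# (found, confirmed) counts from Tab. 3.
counts = {
    "homemade + HWGN": (766, 725),
    "homemade + NASC": (813, 804),
    "LAMBDA + HWGN": (3928, 3873),
    "LAMBDA + NASC": (3560, 3550),
}

purities = {name: round(purity(confirmed, found), 1)
            for name, (found, confirmed) in counts.items()}

# Assumed number of simulated "homemade" clusters (inferred, see above).
N_SIMULATED_HOMEMADE = 1000
homemade_completeness = {
    name: round(completeness(counts[name][1], N_SIMULATED_HOMEMADE), 1)
    for name in ("homemade + HWGN", "homemade + NASC")
}
```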

There is a considerable difference between the two datasets 
used in this work, which allowed us to test and evaluate the 
pipeline under two very different conditions. 

The total completeness obtained from the analysis of our
simple "homemade" simulations provides a first indication of









[Figure: map titled "SZ effect recovered"]

Fig. 5. Our SZ-effect map recovered with the JADE algorithm from the
analysis of our "homemade" maps contaminated by HWGN.

Fig. 6. Gnomonic projection, centred on the north Galactic pole, of
the SZ map in Fig. 5 ("homemade" + HWGN result) when calibrated for
each input frequency.

Table 3. Results for both datasets.

Input maps           Found      Confirmed   Purity (a)   Completeness (b)
                     clusters   clusters    (%)          (%)
"Homemade" + HWGN    766        725         94.6         72.5
"Homemade" + NASC    813        804         98.9         80.4
LAMBDA + HWGN        3928       3873        98.6         0.034
LAMBDA + NASC        3560       3550        99.7         0.031

(a) purity = true detections / total detections
(b) completeness = true detections / simulated clusters




Fig. 7. Graph of simulated and calibrated y parameters (horizontal
axis: simulated y), obtained by analysing our "homemade" simulation
with HWGN. The diagonal straight line is the line of equality.

the efficiency of the method. However, it is insufficient, since 
it does not account for other contaminants of the SZ signal, such 
as the cross-terms of the thermal and kinetic SZ effects, radio 
sources, unresolved SZ clusters, and the SZ background. More
thorough testing was done by processing the LAMBDA maps
through the pipeline. Despite the large differences between the two
datasets, we obtained very similar purities (Tab. 3).

Another point worth mentioning is that, in this work, it
makes no sense to discuss the total completeness of the resulting
catalogue, since there is a very large number of clusters
distributed over the full sky, most of them well below
$5 \times 10^{14}\,M_\odot$, and so unresolved via the SZ signal.
Nevertheless, the results obtained using the LAMBDA maps presented a
level of purity above 90%. The completeness in the mass and redshift
intervals behaves as expected, reaching very low levels at lower
masses and clearly increasing towards tens of percent for masses
above $5 \times 10^{14}\,M_\odot$.

6. Concluding remarks 

We have presented our implementation of a method to identify 
galaxy clusters using the SZ effect in CMB observations. We 
have adapted JADE, a publicly available algorithm, to deal with 
noisy data by applying a pre-whitening, wavelet-based process 
and added a source detection package (SExtractor) at the end of 
our pipeline. 

We have found that the most attractive feature of this method is
that it is based on a blind search algorithm, i.e., its application
does not require any a priori information about the targets.
The essential contributions of this work were the following:

1) The wavelet-based analysis tool has been adapted to per- 
form the initial cleaning of the input data. Since JADE was de- 
signed to perform in the absence of noise, data preprocessing 
was essential to ensure the efficient performance of our algo- 
rithm. 



2) A parameter set was determined for the full pipeline 
(wavelet tool, JADE, and SExtractor) that delivered catalogues 
from two simulated datasets with a level of purity (ratio of con- 
firmed clusters to total detected clusters) above 90%. 

Our method is a complementary approach to the MF algorithm
currently used by the Planck, SPT and ACT collaborations
(Planck Collaboration 2011; Story et al. 2011; Hand et al.)
and can be used as a redundant tool in their data analysis
pipeline. The results of using MF in CMB data analysis
are widely described in the literature, hence we perform no fur- 
ther testing here. Our goal is to describe an alternative (and use- 
ful) technique to identify SZ clusters with no prior assumptions 
about the input data and under very different input conditions. 
We do not intend to perform a direct comparison between the 
two methods. 

We have developed a full pipeline, which is represented
in the block diagram of Fig. 4 and can be summarized in four
main steps: data preprocessing (denoising) based on a wavelet
tool, separation of components (emissions) by JADE, calibration
of the recovered SZ map, and the identification of the positions
and intensities of the clusters using the SExtractor package.
Two simulated datasets were run through this pipeline: a
"homemade" set and the more complete LAMBDA dataset, which were
both described in Sections 3.1 and 3.2.

The results presented in Tab. 3 of Section 5 indicate that
our method performed very efficiently for both datasets. They
vary slightly according to the characteristics of the data,
especially in terms of the noise characteristics, and we caution that
the whole pipeline may perform differently when applied to real
data. Thus, the application of our method to real data may
require some adjustment in the preprocessing phase to determine
the optimal parameters for the denoising and target extraction,
as discussed in Sections 4.1 and 5.

Fig. 8. Relation between completeness and redshift intervals for
the recovered SZ catalogues. The panels, from top to bottom,
correspond to the results of the analysis of the "homemade" + HWGN,
"homemade" + NASC, LAMBDA + HWGN, and LAMBDA + NASC datasets,
respectively.

Fig. 9. Relation between completeness and M500 intervals for
the recovered SZ catalogues. The panels, from top to bottom,
correspond to the results of the analysis of the "homemade" + HWGN,
"homemade" + NASC, LAMBDA + HWGN, and LAMBDA + NASC datasets,
respectively.